44 research outputs found

Reservoir of Diverse Adaptive Learners and Stacking Fast Hoeffding Drift Detection Methods for Evolving Data Streams

Author: Paquet Eric
Pesaranghader Ali
Viktor Herna
Publication venue: 'Springer Science and Business Media LLC'
Publication date: 07/09/2017

The last decade has seen a surge of interest in adaptive learning algorithms for data stream classification, with applications ranging from predicting ozone level peaks and learning stock market indicators to detecting computer security violations. In addition, a number of methods have been developed to detect concept drifts in these streams. Consider a scenario where we have a number of classifiers with diverse learning styles and different drift detectors. Intuitively, the current 'best' (classifier, detector) pair is application dependent and may change as a result of the stream evolution. Our research builds on this observation. We introduce the Tornado framework that implements a reservoir of diverse classifiers, together with a variety of drift detection algorithms. In our framework, all (classifier, detector) pairs proceed, in parallel, to construct models against the evolving data streams. At any point in time, we select the pair which currently yields the best performance. We further incorporate two novel stacking-based drift detection methods, namely the FHDDMS and FHDDMS_add approaches. The experimental evaluation confirms that the current 'best' (classifier, detector) pair is not only heavily dependent on the characteristics of the stream, but also that this selection evolves as the stream flows. Further, our FHDDMS variants detect concept drifts accurately in a timely fashion while outperforming the state-of-the-art.

Comment: 42 pages and 14 figures
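The selection loop described in the abstract above can be sketched roughly as follows. This is a minimal illustration, not the Tornado implementation: the two toy learners, the `WindowDetector` class (a plain windowed error-rate threshold, not FHDDMS), the function `run_reservoir`, and the use of cumulative correct predictions as the "best performance" criterion are all assumptions made for the example.

```python
from collections import Counter, deque


class MajorityClass:
    """Toy learner (hypothetical): predicts the most frequent label seen so far."""

    def __init__(self):
        self.counts = Counter()

    def predict(self, x):
        return self.counts.most_common(1)[0][0] if self.counts else 0

    def learn(self, x, y):
        self.counts[y] += 1

    def reset(self):
        self.counts.clear()


class LastLabel:
    """Toy learner (hypothetical): predicts the label of the previous instance."""

    def __init__(self):
        self.last = 0

    def predict(self, x):
        return self.last

    def learn(self, x, y):
        self.last = y

    def reset(self):
        self.last = 0


class WindowDetector:
    """Stand-in detector (assumption): signals drift when the error rate
    over a full sliding window exceeds a fixed threshold."""

    def __init__(self, size=20, threshold=0.6):
        self.window = deque(maxlen=size)
        self.threshold = threshold

    def update(self, correct):
        self.window.append(0 if correct else 1)
        full = len(self.window) == self.window.maxlen
        if full and sum(self.window) / len(self.window) > self.threshold:
            self.window.clear()
            return True
        return False


def run_reservoir(stream, pairs):
    """Run every (learner, detector) pair on each instance, test-then-train,
    and record which pair has the most correct predictions so far."""
    correct = {name: 0 for name in pairs}
    best_over_time = []
    for x, y in stream:
        for name, (learner, detector) in pairs.items():
            hit = learner.predict(x) == y      # test first
            correct[name] += hit
            if detector.update(hit):           # drift signalled: rebuild the model
                learner.reset()
            learner.learn(x, y)                # then train
        best_over_time.append(max(correct, key=correct.get))
    return best_over_time
```

On a stream whose label switches halfway through, the recorded "best" pair can change over time, which is the behaviour the abstract reports at a much larger scale. A windowed or otherwise weighted score could replace the cumulative count if recent performance should matter more.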

arXiv.org e-Print Archive

NRC Publications Archive

An Exhaustive Shape-Based Approach for Proteins' Secondary, Tertiary and Quaternary Structures Indexing, Retrieval and Docking

Author: Eric Paquet
Herna L. Viktor
Publication venue: 'IntechOpen'
Publication date: 20/04/2012

IntechOpen

Faithful to Whom? Questioning Interpretability Measures in NLP

Author: Crothers Evan
Japkowicz Nathalie
Viktor Herna
Publication date: 13/08/2023

A common approach to quantifying model interpretability is to calculate faithfulness metrics based on iteratively masking input tokens and measuring how much the predicted label changes as a result. However, we show that such metrics are generally not suitable for comparing the interpretability of different neural text classifiers as the response to masked inputs is highly model-specific. We demonstrate that iterative masking can produce large variation in faithfulness scores between comparable models, and show that masked samples are frequently outside the distribution seen during training. We further investigate the impact of adversarial attacks and adversarial training on faithfulness scores, and demonstrate the relevance of faithfulness measures for analyzing feature salience in text adversarial attacks. Our findings provide new insights into the limitations of current faithfulness metrics and key considerations to utilize them appropriately.
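The iterative masking procedure mentioned in the abstract above can be illustrated with a small sketch. Everything here is assumed for the example: the keyword-weight classifier `predict_proba`, the `WEIGHTS` table, the `[MASK]` token, and the function `iterative_masking_curve`. The paper evaluates neural text classifiers and specific faithfulness metrics that this toy does not reproduce.

```python
import math

MASK = "[MASK]"
# Hypothetical toy model: per-word weights for a sentiment-style classifier.
WEIGHTS = {"great": 2.0, "good": 1.0, "bad": -1.5, "awful": -2.5}


def predict_proba(tokens):
    """Toy classifier: sigmoid of the summed word weights, read as P(positive)."""
    score = sum(WEIGHTS.get(t, 0.0) for t in tokens)
    return 1.0 / (1.0 + math.exp(-score))


def iterative_masking_curve(tokens, importance):
    """Mask tokens from most to least important and return the probability
    of the originally predicted label after each masking step."""
    label_is_positive = predict_proba(tokens) >= 0.5
    order = sorted(range(len(tokens)), key=lambda i: -importance[i])
    masked = list(tokens)
    curve = []
    for i in order:
        masked[i] = MASK
        p = predict_proba(masked)
        curve.append(p if label_is_positive else 1.0 - p)
    return curve
```

A faithfulness score is typically some summary of such a curve, for example how quickly the probability of the original label falls. Because the curve depends on how a given model reacts to `[MASK]` inputs, scores from two different models are not directly comparable, which is the concern the abstract raises.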

arXiv.org e-Print Archive

Towards Ethical Content-Based Detection of Online Influence Campaigns

Author: Crothers Evan
Japkowicz Nathalie
Viktor Herna
Publication venue: 'Institute of Electrical and Electronics Engineers (IEEE)'
Publication date: 28/08/2019

The detection of clandestine efforts to influence users in online communities is a challenging problem with significant active development. We demonstrate that features derived from the text of user comments are useful for identifying suspect activity, but lead to increased erroneous identifications when keywords over-represented in past influence campaigns are present. Drawing on research in native language identification (NLI), we use "named entity masking" (NEM) to create sentence features robust to this shortcoming, while maintaining comparable classification accuracy. We demonstrate that while NEM consistently reduces false positives when key named entities are mentioned, both masked and unmasked models exhibit increased false positive rates on English sentences by Russian native speakers, raising ethical considerations that should be addressed in future research.

Comment: To appear in "Special Session on Machine learning for Knowledge Discovery in the Social Sciences" at IEEE Machine Learning for Signal Processing Workshop (MLSP) 201
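As a rough illustration of the "named entity masking" idea in the abstract above, the sketch below replaces known entity strings with a placeholder before features are extracted. The function name `mask_named_entities`, the `[ENTITY]` placeholder, and the hand-supplied entity list are assumptions for the example; an actual pipeline would presumably obtain entity spans from a named entity recognizer, and the paper's exact procedure may differ.

```python
import re


def mask_named_entities(text, entities, placeholder="[ENTITY]"):
    """Replace each listed entity with a placeholder (whole-word matches only).

    Longer entities are handled first so that a multi-word name is masked
    before any shorter name it contains.
    """
    for entity in sorted(entities, key=len, reverse=True):
        text = re.sub(r"\b" + re.escape(entity) + r"\b", placeholder, text)
    return text
```

For example, `mask_named_entities("Moscow and New York voted", ["New York", "Moscow"])` yields `"[ENTITY] and [ENTITY] voted"`, so a downstream classifier cannot key on the specific place names.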

arXiv.org e-Print Archive

Crossref

Enhancing Government Decision Making through Knowledge Discovery from Data

Author: Arndt Heidi
Oberholzer Mauritz
Viktor Herna
Publication venue: AIS Electronic Library (AISeL)
Publication date: 01/01/2000

A major challenge facing management in developed countries is improving the performance of knowledge and service workers, i.e. the decision makers. In a developing country such as South Africa, with a well-developed business sector, the need to improve the performance of decision makers, especially in government, is even more crucial. South Africa has to face many new challenges in the 21st century: growing environmental concerns, massive social and economic inequalities, an ageing population, low productivity, massive unemployment and the nation's evolving role in Africa. The importance of science and technology to address these pressing issues cannot be overemphasised. This paper discusses the development of a knowledge-base to aid government decision makers in interpreting the results of the National Research and Technology (NRT) Audit that was undertaken by the South African Department of Arts, Culture, Science and Technology. An intelligent data analysis tool is employed to construct a knowledge-base, using a data-driven rather than a knowledge-driven approach to knowledge-base construction. The knowledge-base is constructed directly from the data as contained in the NRT Audit data warehouse. The rules contained in the knowledge-base are produced by a team of data mining techniques that cooperate as members of a learning system. This knowledge-base is used to augment the knowledge of the human experts. Results show that the information, as discovered during the knowledge-base construction process, either enhanced or contradicted the findings of the human experts.

AIS Electronic Library (AISeL)

Machine Generated Text: A Comprehensive Survey of Threat Models and Detection Methods

Author: Crothers Evan
Japkowicz Nathalie
Viktor Herna
Publication date: 15/02/2023

Machine generated text is increasingly difficult to distinguish from human authored text. Powerful open-source models are freely available, and user-friendly tools that democratize access to generative models are proliferating. ChatGPT, which was released shortly after the first preprint of this survey, epitomizes these trends. The great potential of state-of-the-art natural language generation (NLG) systems is tempered by the multitude of avenues for abuse. Detection of machine generated text is a key countermeasure for reducing abuse of NLG models, with significant technical challenges and numerous open problems. We provide a survey that includes both 1) an extensive analysis of threat models posed by contemporary NLG systems, and 2) the most complete review of machine generated text detection methods to date. This survey places machine generated text within its cybersecurity and social context, and provides strong guidance for future work addressing the most critical threat models, and ensuring detection systems themselves demonstrate trustworthiness through fairness, robustness, and accountability.

Comment: Manuscript submitted to ACM Special Session on Trustworthy AI. 2022/11/19 - Updated reference

arXiv.org e-Print Archive

Learning by cooperation : an approach to rule induction and knowledge fusion

Author: Viktor Herna Lydia
Publication venue: Stellenbosch : Stellenbosch University
Publication date: 01/01/1999

Dissertation (Ph.D.) -- University of Stellenbosch, 1999. Full text to be digitised and attached to bibliographic record.

Stellenbosch University SUNScholar Repository

A comparative study of distributed database recovery techniques

Author: Viktor Herna Lydia
Publication venue: Stellenbosch : Stellenbosch University
Publication date: 01/01/1992

Thesis (M. Sc.) -- University of Stellenbosch, 1992. One copy microfiche. Full text to be digitised and attached to bibliographic record.

Stellenbosch University SUNScholar Repository